Configuring Settings For The Data Source

After opening a data file you have to make a number of settings to make sure that the source data is interpreted and grouped the way you want. These settings are found on the Settings pane at the left.

Input data settings (Delimiters)

The Input Data settings (on the Settings pane at the left) specify how the input data must be interpreted. These settings are different for each data type. For a CSV file, for example, it is important to specify the delimiter that separates data fields. PDF files are already delimited naturally by pages, so the input data settings for PDF files are interpretation settings for text in the file.
For an overview of all options, see: Input Data.

For a CSV File

In a CSV file, data is read line by line, where each line can contain multiple fields, separated by a delimiter. Even though CSV stands for comma-separated values, fields may be separated using any character, including commas, tabs, semicolons, and pipes.
The text delimiter is used to wrap around each field just in case the field values contain the field separator. This ensures that, for example, the field “Smith; John” is not interpreted as two fields, even if the field delimiter is the semicolon.
For an explanation of all the options, see: CSV file Input Data settings.

For a PDF File

PDF files have a clear and unmovable delimiter: pages. So, the Input Data settings are not used to set delimiters. Instead, these options determine how words, lines and paragraphs are detected when you select content in the PDF to extract data from it.
For an explanation of all the options, see: PDF file Input Data settings.

For a database

Databases all return the same type of information. Therefore the Input Data options for a database refer to the tables inside the database. Clicking on any of the tables shows the first line of the data in that table.
If the database supports stored procedures, including inner joins, grouping and sorting, you can use custom SQL to make a selection from the database, using whatever language the database supports.
For an explanation of all the options, see: Database Input Data settings.

For a text file

Because text files have many different shapes and sizes, there are a lot of input data settings for these files. You can add or remove characters in lines if it has a header you want to get rid of, or strange characters at the beginning of your file, for example; you can set a line width if you are still working with old line printer data; etc.
It is important that pages are defined properly. This can be done either by using a set number of lines or using a text (for example, the character “P”), to detect on the page. Be aware that this is not a Boundary setting; it detects each new page, not each new record.
For an explanation of all the options, see: Text file Input Data settings.

For an XML file

XML is a special file format because these file types can have a theoretically unlimited number of structure types. The input data has two simple options that basically determine at which node level a new record is created. You can either select an element type, to create a new delimiter every time that element is encountered, or choose to use the root node. If there is only one top-level element, there will only be one record before the Boundaries are set.
See also: XML File Input Data settings.

Record boundaries

Boundaries are the division between records: they define where one record ends and the next record begins. Using boundaries, you can organize the data the way you want.
You could use the exact same data source with different boundaries in order to extract different information. If, for instance, a PDF file contains multiple invoices, each invoice could be a record, or all invoices for one customer could go into a single record.
Keep in mind that when the data is merged with a template, each record generates output (print, email, web page) for a single recipient.

To set a boundary, a specific trigger must be defined.
The trigger can be a natural delimiter between blocks of data, such as a row in a CSV file or a page in a PDF file.
It can also be something in the data that is either static (for example, the text "Page 1 of" in a PDF file) or changing (a customer ID, a user name, etc).
To define a more complex trigger you could write a script (see Setting boundaries using JavaScript).
A new record cannot start in the middle of a data field, so if the trigger is something in the data, the boundary will be set on the nearest preceding natural delimiter. If for instance in a PDF file the text "Page 1 of" is used as the trigger, the new record starts at the page break before that text.

Data format settings

By default the data type of extracted data is a String, but each field in the Data Model can be set to contain another data type (see Data types). When that data type is Date, Number or Currency, the DataMapper will expect the data in the data source to be formatted in a certain way, depending on the settings.

The default format for dates, numbers and currencies can be set in three places: in the user preferences, in the data source settings, and per field in the Data Model.

By default, the user preferences are set to the system preferences. These user preferences become the default format values for any newly created data mapping configuration. To change these preferences, select Window > Preferences > DataMapper > DataMapper default format (see Datamapper preferences).

Data format settings defined in the data source settings apply to any new extraction made in the current data mapping configuration. These settings are made on the Settings pane; see Settings pane.

Settings for a field that contains extracted data are made via the properties of the Extract step that the field belongs to (see Setting the data type). Any format settings specified per field are always used, regardless of the user preferences or data source settings.

Data format settings tell the DataMapper how certain types of data are formatted in the data source. They don't determine how these data are formatted in the Data Model or in a template. In the Data Model, data are converted to the native data type. Dates, for example, are converted to a DateTime object in the Data Model, and will always be shown as "year-month-day" plus the time stamp, for example: 2012-04-11 12.00 AM.

Data source settings